Characterizing Web Resources for Improved Search
Abstract
As an important initial step to exploit such dimensions for web search, we have focused on geographical relevance. Web sites containing information on restaurants or apartment rentals, for instance, are relevant primarily to web users in geographical proximity to these locations. In contrast, an on-line newspaper may be relevant to users across the United States. We have studied how to mine the web and automatically estimate the geographical scope of web resources by using web hyperlinks and the actual content of web pages. For example, we can map every web page to a location based on where its hosting site resides. Then, we can consider the location of all the pages that point to, say, the Stanford Daily home page. By examining the distribution of these pointers, we can conclude that the Stanford Daily is of interest mainly to residents of the Stanford area, while The Wall Street Journal is of nation-wide interest. Similar conclusions can be drawn for other resources by analyzing the geographical locations that are mentioned in their pages.
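The link-based idea described above can be illustrated with a minimal sketch. It assumes each inbound link has already been mapped to a coarse location label (here, a US state code), and it classifies a resource as local when a single location accounts for most of its inbound links. The function names, the `dominance_threshold` parameter, and the sample data are hypothetical illustrations, not the measures or data used in the original study.

```python
from collections import Counter


def link_distribution(inbound_locations):
    """Return the share of inbound links that originate from each location."""
    counts = Counter(inbound_locations)
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.items()}


def estimate_scope(inbound_locations, dominance_threshold=0.6):
    """Classify a resource as local or broad from where its inbound links come from.

    dominance_threshold is an assumed cutoff chosen for illustration only:
    if one location supplies at least this share of the links, the resource
    is treated as mainly of interest to that location.
    """
    if not inbound_locations:
        return ("unknown", None)
    shares = link_distribution(inbound_locations)
    top_location, top_share = max(shares.items(), key=lambda kv: kv[1])
    if top_share >= dominance_threshold:
        return ("local", top_location)
    return ("broad", None)


# Made-up link locations: a campus newspaper linked mostly from one state,
# and a national newspaper linked from many states.
campus_paper_links = ["CA"] * 8 + ["NY", "TX"]
national_paper_links = ["CA", "NY", "TX", "IL", "WA", "FL", "NY", "MA", "CA", "GA"]

print(estimate_scope(campus_paper_links))
print(estimate_scope(national_paper_links))
```

A single dominance cutoff is the simplest possible decision rule; a more careful estimate would also normalize by how many pages each location hosts, so that heavily populated regions do not dominate by default.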
Related resources
Image flip CAPTCHA
The massive and automated access to Web resources through robots has made it essential for Web service providers to make some conclusion about whether the "user" is a human or a robot. A Human Interaction Proof (HIP) like Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) offers a way to make such a distinction. CAPTCHA is a reverse Turing test used by Web serv...
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
Better bioinformatics through usability analysis
Motivation: Improving the usability of bioinformatics resources enables researchers to find, interact with, share, compare and manipulate important information more effectively and efficiently. It thus enables researchers to gain improved insights into biological processes with the potential, ultimately, of yielding new scientific results. Usability 'barriers' can pose significant obstacles to a...
Characterizing Semantic Relatedness of Search Query Terms
Mining for semantic information in search engine query logs bears great potential for both the optimization of search engines and bootstrapping Semantic Web applications. The interaction of a user with a search engine (more specifically clicklog information) has recently been viewed as implicit tagging of resources by query terms. The resulting structure – previously called a logsonomy – exhibi...
Design and Implementation of a Web directory for Medical Education (WDME): a Tool to Facilitate Research in Medical Education
Introduction: Access to the medical education resources on the web is one of current challenges for researchers and medical science educators. The purpose of current project was to design and implement a comprehensive and specific subject/web directory of medical education. Methods: First, the categories to be incorporated in the directory were defined through reviewing related directories an...